Hilberg’s Conjecture — a Challenge for Machine Learning

Author

  • Łukasz Dębowski
Abstract

We review three mathematical developments linked with Hilberg’s conjecture, a hypothesis about the power-law growth of the entropy of texts in natural language that poses a challenge for machine learning. First, considerations concerning maximal repetition indicate that universal codes such as the Lempel-Ziv code may fail to efficiently compress sources that satisfy Hilberg’s conjecture. Second, Hilberg’s conjecture implies the empirically observed power-law growth of vocabulary in texts. Third, Hilberg’s conjecture can be explained by the hypothesis that texts consistently describe an infinite random object.
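The abstract's first point refers to maximal repetition. As a minimal sketch, assuming the usual meaning of the term (the length of the longest substring that occurs at least twice in a text, overlaps allowed), it can be computed as below. The function names are hypothetical and are not taken from the paper, and the brute-force search is meant for short strings only.

```python
def has_repeat(text: str, k: int) -> bool:
    """Return True if some substring of length k occurs at least twice."""
    seen = set()
    for i in range(len(text) - k + 1):
        chunk = text[i:i + k]
        if chunk in seen:
            return True
        seen.add(chunk)
    return False


def maximal_repetition(text: str) -> int:
    """Length of the longest substring occurring at least twice (0 if none).

    If a substring of length k repeats, so does one of length k - 1,
    so the answer can be found by binary search over k.
    """
    lo, hi = 0, max(len(text) - 1, 0)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_repeat(text, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

For example, `maximal_repetition("banana")` returns 3, because "ana" occurs at two overlapping positions. For long texts a suffix array or suffix automaton would be a more suitable choice than this set-based search.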


Similar resources

Empirical Evidence for Hilberg’s Conjecture in Single-Author Texts

Hilberg’s conjecture is a statement that the mutual information between two adjacent blocks of text in natural language scales as n^β, where n is the block length. Previously, this hypothesis has been linked to Herdan’s law on the levels of word frequency and of text semantics. Thus it is worth a direct empirical test. In the present paper, Hilberg’s conjecture is tested for a selection of Engli...


Hilberg’s Conjecture: an Updated FAQ

This note is a brief introduction to theoretical and experimental results concerning Hilberg’s conjecture, a hypothesis about natural language. The aim of the text is to provide a short guide to the literature.

1 What is Hilberg’s conjecture?

In the early days of information theory, Shannon (1951) published estimates of conditional entropy for printed English. A few decades later, Hilberg (1990...


A Hybrid Machine Learning Method for Intrusion Detection

Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Various artificial intelligence techniques have already been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...


A Preadapted Universal Switch Distribution for Testing Hilberg's Conjecture

Hilberg’s conjecture states that the mutual information between two adjacent long blocks of text in natural language grows like a power of the block length. The exponent in this hypothesis can be upper bounded using the pointwise mutual information computed for a carefully chosen code. The lower the compression rate, the better the bound, but the code is required to be univers...
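The idea of reading mutual information off code lengths can be illustrated with a rough sketch. It assumes the compression-based estimate C(left) + C(right) - C(left + right), where C is the compressed length in bits, and it uses `zlib` purely as a stand-in: this is not the switch distribution of the paper, and the fitted slope is only an informal illustration, not the upper bound the abstract describes. All function names are hypothetical.

```python
import math
import zlib


def code_length_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data (stand-in for a code length)."""
    return 8 * len(zlib.compress(data, 9))


def block_mutual_information(text: bytes, n: int) -> int:
    """Compression-based estimate for two adjacent blocks of n bytes each."""
    if len(text) < 2 * n:
        raise ValueError("text must contain at least 2 * n bytes")
    left, right = text[:n], text[n:2 * n]
    return (code_length_bits(left) + code_length_bits(right)
            - code_length_bits(left + right))


def fit_power_law_exponent(ns, values) -> float:
    """Least-squares slope of log(value) against log(n).

    Assumes all values are positive and at least two distinct n are given.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

A possible use is to evaluate `block_mutual_information` for several block lengths on one long text and pass the positive results to `fit_power_law_exponent`. Because DEFLATE looks back at most 32 KiB, `zlib` cannot exploit repetitions between blocks much longer than that, so a compressor with a larger window would be needed for large n.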






Publication date: 2015